arXiv:1103.0890v2 [cs.LG] 4 May 2013
EFFICIENT MULTI-TEMPLATE LEARNING FOR STRUCTURED PREDICTION
Author
Abstract
Conditional Random Fields (CRFs) and Structural Support Vector Machines (Structural SVMs) are two state-of-the-art methods for structured prediction that capture the interdependencies among output variables. The success of these methods is attributed to the fact that their discriminative models can account for overlapping features over the whole input observation. These features are usually generated by applying a given set of templates to labeled data, but improper templates may degrade performance. To alleviate this issue, we propose a novel multiple template learning paradigm that learns the structured prediction model and the importance of each template simultaneously, so that hundreds of arbitrary templates can be added to the learning model without caution. This paradigm can be formulated as a special multiple kernel learning problem with an exponential number of constraints. We then introduce an efficient cutting plane algorithm that solves this problem in the primal, and we present its convergence. We also evaluate the proposed learning paradigm on two widely studied structured prediction tasks, namely sequence labeling and dependency parsing. Extensive experimental results show that the proposed method outperforms CRFs and Structural SVMs because it exploits the importance of each template. Our complexity analysis and empirical results also show that the proposed method is more efficient than OnlineMKL on very sparse and high-dimensional data. We further extend this paradigm to structured prediction with generalized p-block norm regularization with p > 1, and experiments show competitive performance when p ∈ [1, 2).

Structured prediction [18, 29, 33] has been successfully applied to problems with strong interdependencies among the output variables. In the realm of Natural Language Processing (NLP), various tasks are formulated as structured prediction problems.
A typical example is part-of-speech tagging, which assigns a specific part-of-speech tag to each token of an input sentence. The tag of one token is strongly correlated with the tags of its neighbors under linear chain dependencies [18, 33]. More complicated structured output dependencies can be trees or graphs, such as Context-Free Grammars (CFGs) [33], dependency parse trees [23, 24], noun phrase coreference [13], and factor graphs for relation extraction [40]. Note that exact inference methods exist for sequences and trees. For tasks with general output structures (e.g., a pairwise fully connected undirected graph), the exact inference problem is intractable. In such cases, approximate inference is usually pursued to obtain an approximate solution [14]. The major advantage of structured prediction models such as Conditional Random Fields (CRFs) [18] and Structural Support Vector Machines (Structural SVMs)

Qi Mao and Ivor W. Tsang are with the School of Computer Engineering, Nanyang Technological University, Singapore 639798, e-mail: {QMAO1,IvorTsang}@ntu.edu.sg.
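
As an illustration of the exact inference for sequences mentioned above, the following is a minimal sketch of Viterbi decoding on a linear chain for the part-of-speech tagging example. It is not taken from the paper: the function name `viterbi`, the three-tag set, and all emission and transition scores are hypothetical toy values chosen only to show how the tag of one token interacts with the tags of its neighbors.

```python
from typing import Dict, List, Tuple


def viterbi(
    emission: List[Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its score on a linear chain.

    emission[i][t] is the score of tag t at position i; transition[(p, t)] is
    the score of tag p being followed by tag t. Missing entries count as 0.0.
    """
    if not emission:
        return [], 0.0

    # best[i][t]: best score of any tag sequence for tokens 0..i ending in t.
    best = [{t: emission[0].get(t, 0.0) for t in tags}]
    back: List[Dict[str, str]] = []  # back[i-1][t]: best previous tag for t at i

    for i in range(1, len(emission)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[i - 1][p] + transition.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[i - 1][prev]
                + transition.get((prev, cur), 0.0)
                + emission[i].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy scores for the sentence "the dog barks".
tags = ["DET", "NOUN", "VERB"]
emission = [
    {"DET": 2.0},                 # "the"
    {"NOUN": 1.5, "VERB": 0.5},   # "dog"
    {"NOUN": 1.0, "VERB": 1.0},   # "barks": ambiguous on its own
]
transition = {("DET", "NOUN"): 1.0, ("NOUN", "VERB"): 1.0}

best_path, best_score = viterbi(emission, transition, tags)
```

In this toy setting the last token is equally likely to be a noun or a verb when scored in isolation, and the transition score from its neighbor resolves the ambiguity. The dynamic program costs O(n·|tags|²) for a sentence of n tokens, which is why exact inference remains tractable for chains.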